
    Balancing lists: a proof pearl

    Starting with an algorithm to turn lists into full trees that uses non-obvious invariants and partial functions, we progressively encode the invariants in the types of the data, removing most of the burden of a correctness proof. The invariants are encoded using non-uniform inductive types which parallel numerical representations in a style advertised by Okasaki, and a small amount of dependent types. Comment: To appear in proceedings of Interactive Theorem Proving (2014).
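    As a rough illustration of the numerical-representation idea, here is a Python sketch of the binary-counter view of the algorithm (our own illustration; the paper's development enforces the invariants statically with non-uniform inductive and dependent types, which this sketch does not capture): inserting a leaf acts like incrementing a binary number, and carries merge equal-sized perfect subtrees.

```python
# Illustrative sketch only (hypothetical; not the paper's formal development).
# Perfect subtrees are merged like carries when incrementing a binary
# counter, in the style of Okasaki's numerical representations.

class Leaf:
    def __init__(self, x):
        self.x = x

class Node:
    def __init__(self, left, right):
        self.left, self.right = left, right

def to_tree(xs):
    """Fold a non-empty list into a full tree (every node has 0 or 2
    children). digits[i] holds at most one perfect subtree of 2**i
    leaves, mirroring the digits of a binary number."""
    digits = []
    for x in xs:
        carry, i = Leaf(x), 0
        while i < len(digits) and digits[i] is not None:
            carry = Node(digits[i], carry)  # carry: merge equal subtrees
            digits[i] = None
            i += 1
        if i == len(digits):
            digits.append(carry)
        else:
            digits[i] = carry
    # Combine the leftover digits into a single full tree.
    trees = [t for t in digits if t is not None]
    result = trees[0]
    for t in trees[1:]:
        result = Node(t, result)
    return result
```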

    An Efficient Algorithm For Chinese Postman Walk on Bi-directed de Bruijn Graphs

    Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a Shortest Double-stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic for getting close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying un-weighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph, which runs in O(|E|^2 log^2(|V|)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ(p(|V| + |E|) log(|V|) + (d_max p)^3) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where p = max{|{v : d_in(v) - d_out(v) > 0}|, |{v : d_in(v) - d_out(v) < 0}|} and d_max = max_v |d_in(v) - d_out(v)|. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much smaller than the number of nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of p/|V| lies between 0.08% and 0.13% with 95% probability.
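    To make the stated parameters concrete, here is a small Python sketch (our own; it uses a plain directed edge list rather than the paper's bi-directed graphs) that computes p and d_max from the node degree imbalances:

```python
# Hypothetical illustration: compute the imbalance parameters p and d_max
# appearing in the running time, given the arcs of a directed multigraph.
from collections import defaultdict

def imbalance_parameters(edges):
    """edges: iterable of (u, v) arcs. Returns (p, d_max)."""
    din, dout = defaultdict(int), defaultdict(int)
    nodes = set()
    for u, v in edges:
        dout[u] += 1
        din[v] += 1
        nodes.update((u, v))
    surplus = [v for v in nodes if din[v] - dout[v] > 0]
    deficit = [v for v in nodes if din[v] - dout[v] < 0]
    p = max(len(surplus), len(deficit))
    d_max = max((abs(din[v] - dout[v]) for v in nodes), default=0)
    return p, d_max

# Tiny example: one imbalanced node gives p = 1, d_max = 1.
print(imbalance_parameters([("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]))
```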

    Cerulean: A hybrid assembly using high throughput short and long reads

    Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads enables the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have a high error rate, and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. (2012) proposed an approach that uses high quality short read data to correct these long reads and thus make assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and later map the long reads onto the assembly graph to resolve repeats. Contribution: We present a hybrid assembly approach that is both computationally effective and produces high quality assemblies. Our algorithm first operates on a simplified version of the assembly graph consisting only of long contigs and gradually improves the assembly by adding smaller contigs in each iteration, as sketched below. In contrast to the state-of-the-art long read error correction technique, which requires high computational resources and long running times on a supercomputer even for bacterial genome datasets, our software can produce a comparable assembly using only a standard desktop in a short running time. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI 2013).
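    A highly simplified Python sketch of this iterative strategy (our own illustration with hypothetical names and data structures; Cerulean's actual graph operations are more involved):

```python
# Hypothetical sketch: start from the long contigs only and gradually admit
# shorter ones, joining contigs whose adjacency is supported by a long read
# that spans both. Not Cerulean's actual pipeline.

def iterative_join(contig_lengths, long_read_links,
                   thresholds=(10_000, 5_000, 1_000)):
    """contig_lengths: {contig: length in bp};
    long_read_links: set of (contig_a, contig_b) adjacencies, each
    supported by a spanning long-read alignment."""
    active, joined = set(), []
    for t in thresholds:  # progressively relax the length cutoff
        active |= {c for c, n in contig_lengths.items() if n >= t}
        for a, b in long_read_links:
            if a in active and b in active and (a, b) not in joined:
                joined.append((a, b))
    return joined
```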

    Simple Formula for Nuclear Charge Radius

    A new formula for the nuclear charge radius is proposed, dependent on the mass number (A) and the neutron excess (N-Z) in the nucleus. It is simple and it reproduces all the experimentally available mean square radii and their isotopic shifts of even-even nuclei much better than other frequently used relations. Comment: The paper contains 7 pages in LaTeX and 6 figures (available upon request) in postscript. Email: [email protected]
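    For orientation, a Python sketch of the general functional form such a relation takes, an A^(1/3) law with a neutron-excess correction; the constants below are placeholders chosen only for illustration, not the fitted values from the paper:

```python
def charge_radius_fm(A, Z, r0=0.95, b=-0.2):
    """Illustrative charge radius in fm: R = r0 * A**(1/3) * (1 + b*(N-Z)/A).
    WARNING: r0 and b are hypothetical placeholder constants, not the
    coefficients fitted in the paper."""
    N = A - Z
    return r0 * A ** (1.0 / 3.0) * (1.0 + b * (N - Z) / A)

# Example: a rough number for lead-208 (A=208, Z=82).
print(round(charge_radius_fm(208, 82), 2))
```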

    Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

    We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.
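    For context, the adaptive dictionary schemes in question are the LZ78 family; the following minimal Python sketch of an LZ78 parse (our illustration, not the paper's algorithm) shows the phrase structure that compressed matching algorithms operate on instead of the decompressed text:

```python
def lz78_parse(text):
    """Return the LZ78 parse as (phrase_index, char) pairs, where
    phrase_index points to a previously seen phrase (0 = empty)."""
    dictionary = {"": 0}          # phrase -> index
    phrases, current = [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch         # extend the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:                   # flush a trailing, already-known phrase
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases

print(lz78_parse("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```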

    Fast Searching in Packed Strings

    Given strings P and Q, the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let m ≀ n be the lengths of P and Q, respectively, and let σ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time O(n/log_σ n + m + occ), where occ is the number of occurrences of P in Q. For m = o(n) this improves the O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if m = O(n/log_σ n) our algorithm is optimal, since any algorithm must spend at least Ω((n + m) log σ / log n + occ) = Ω(n/log_σ n + occ) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation. Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 2009.
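    As a toy Python sketch of the packed representation itself (ours; the paper's speed-up comes from tabulating KMP subautomaton transitions on whole packed words, which is not shown here):

```python
# Toy illustration of packed strings: several small-alphabet characters
# live in one machine word, so one word-sized read inspects many
# characters at once.

def pack(s, bits_per_char=8, chars_per_word=8):
    """Pack string s into a list of integers, bits_per_char bits each."""
    words = []
    for i in range(0, len(s), chars_per_word):
        w = 0
        for ch in reversed(s[i:i + chars_per_word]):
            w = (w << bits_per_char) | ord(ch)
        words.append(w)
    return words

def char_at(words, j, bits_per_char=8, chars_per_word=8):
    """Read the j-th character back out of the packed words."""
    w = words[j // chars_per_word]
    shift = bits_per_char * (j % chars_per_word)
    return chr((w >> shift) & ((1 << bits_per_char) - 1))

packed = pack("abracadabra")
print(char_at(packed, 10))  # -> 'a'
```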

    DooSo6: Easy Collaboration over Shared Projects

    Existing tools for supporting parallel work have some disadvantages that prevent them from being widely used. Very often they require a complex installation and the creation of accounts for all group members. Users need to learn and deal with complex commands to use these collaborative tools efficiently. Some tools require users to abandon their favourite editors and force them to use a certain co-authorship application. In this paper, we propose the DooSo6 collaboration tool that offers support for parallel work, requires no installation and no account creation, and is easy to use, with users able to continue working in their favourite editors. User authentication is achieved by means of a capability-based mechanism.
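    A minimal Python sketch of what a capability-based mechanism can look like (our illustration; the paper's actual protocol may differ): possession of an unguessable token is itself the access right, so no accounts are needed.

```python
# Illustrative capability-style access (hypothetical, not DooSo6's actual
# protocol): an unguessable random token grants access to a shared project;
# sharing the token with a collaborator shares the access right.
import secrets

projects = {}  # capability token -> project state

def create_project(name):
    token = secrets.token_urlsafe(32)   # unguessable capability
    projects[token] = {"name": name, "documents": {}}
    return token                        # handed out of band to the group

def open_project(token):
    # Knowing the token IS the authorization; no user identity is checked.
    return projects[token]

cap = create_project("paper-draft")
print(open_project(cap)["name"])
```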

    Faster Approximate String Matching for Short Patterns

    We study the classical approximate string matching problem: given strings P and Q and an error threshold k, find all ending positions of substrings of Q whose edit distance to P is at most k. Let P and Q have lengths m and n, respectively. On a standard unit-cost word RAM with word size w ≄ log n we present an algorithm using time O(nk · min(log^2 m / log n, log^2 m · log w / w) + n). When P is short, namely m = 2^{o(√(log n))} or m = 2^{o(√(w/log w))}, this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism. Comment: To appear in Theory of Computing Systems.
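    For reference, the textbook dynamic program that such algorithms accelerate, in a plain O(nm) Python sketch (our illustration; the paper's contribution is a much faster tabulated Landau-Vishkin implementation, not this):

```python
def approx_matches(P, Q, k):
    """All ending positions j (1-based) in Q of substrings whose edit
    distance to P is at most k. Classic semi-global DP: dp[i] holds the
    cheapest way to turn P[:i] into some substring of Q ending at j."""
    m = len(P)
    dp = list(range(m + 1))           # column j = 0: i deletions from P
    out = []
    for j in range(1, len(Q) + 1):
        prev_diag, dp[0] = dp[0], 0   # a match may start at any position
        for i in range(1, m + 1):
            cur = dp[i]
            dp[i] = min(dp[i] + 1,                           # skip Q[j-1]
                        dp[i - 1] + 1,                       # skip P[i-1]
                        prev_diag + (P[i - 1] != Q[j - 1]))  # (mis)match
            prev_diag = cur
        if dp[m] <= k:
            out.append(j)
    return out

print(approx_matches("survey", "surgery", k=2))  # ending positions in Q
```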

    Can we avoid high coupling?

    It is considered good software design practice to organize source code into modules and to favour within-module connections (cohesion) over between-module connections (coupling), leading to the oft-repeated maxim "low coupling/high cohesion". Prior research into network theory and its application to software systems has found evidence that many important properties in real software systems exhibit approximately scale-free structure, including coupling; researchers have claimed that such scale-free structures are ubiquitous. This implies that high coupling must be unavoidable, statistically speaking, apparently contradicting standard ideas about software structure. We present a model that leads to the simple predictions that approximately scale-free structures ought to arise both for between-module connectivity and overall connectivity, and not as the result of poor design or optimization shortcuts. These predictions are borne out by our large-scale empirical study. Hence we conclude that high coupling is not avoidable, and that this is in fact quite reasonable.
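    A rough Python sketch of the kind of measurement involved (our illustration, not the paper's methodology): tally each module's between-module degree; in an approximately scale-free system this tally has a heavy, roughly straight tail on a log-log plot.

```python
# Hypothetical illustration: coupling degree distribution of a module graph.
from collections import Counter

def coupling_degrees(dependencies):
    """dependencies: iterable of (module_a, module_b) between-module edges.
    Returns {module: number of distinct modules it is coupled to}."""
    neighbours = {}
    for a, b in dependencies:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {m: len(ns) for m, ns in neighbours.items()}

def degree_histogram(degrees):
    """Map degree -> how many modules have that degree."""
    return Counter(degrees.values())

deps = [("ui", "core"), ("db", "core"), ("net", "core"), ("ui", "net")]
print(degree_histogram(coupling_degrees(deps)))  # Counter({2: 2, 3: 1, 1: 1})
```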

    Brane Interaction as the Origin of Inflation

    We reanalyze brane inflation with brane-brane interactions at an angle, which include the special case of brane-anti-brane interaction. If nature is described by a stringy realization of the brane world scenario today (with arbitrary compactification), and if some additional branes were present in the early universe, we find that an inflationary epoch is generically quite natural, ending with a big bang when the last branes collide. In an interesting brane inflationary scenario suggested by generic string model-building, we use the density perturbation observed in the cosmic microwave background and the coupling unification to find that the string scale is comparable to the GUT scale. Comment: 28 pages, 8 figures, 2 tables, JHEP format.
    • 

    corecore